Abstract

In part I, we reviewed how Shannon's classical notion of capacity is not sufficient to characterize a noisy
communication channel if the channel is intended to be used as part of a feedback loop to stabilize an unstable
scalar linear system. While classical capacity is not enough, a sense of capacity (parametrized by reliability) called
"anytime capacity" is both necessary and sufficient for channel evaluation in this context. The rate required is the
log of the open-loop system gain and the required reliability comes from the desired sense of stability. Sufficiency
is maintained even in cases with noisy observations and without any explicit feedback between the observer and
the controller. This established the asymptotic equivalence between scalar stabilization problems and delay-universal
communication problems with feedback.

Here in part II, the vector-state generalizations are established and it is the magnitudes of the unstable eigenvalues
that play an essential role. To deal with such systems, the concept of the anytime rate-region is introduced. This is
the region of rates that the channel can support while still meeting potentially different anytime reliability targets for
parallel message streams. All the scalar results generalize on an eigenvalue by eigenvalue basis. When there is no
explicit feedback of the noisy channel outputs, the intrinsic delay of the unstable system tells us what the feedback
delay needs to be while evaluating the anytime-rate-region for the channel. An example involving a binary erasure
channel is used to illustrate how differentiated service is required in any separation-based control architecture.

Index Terms

Real-time information theory, reliability functions, control over noisy channels, differentiated service, feedback,
anytime decoding
The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy
communication link
Part II: vector systems



One of Shannon's key contributions was the idea that bits could be used as a single universal currency for 
communication. For a vast class of point-to-point applications, the communication aspect of the problem can be 
reduced to transporting bits reliably from one point to another where the required sense of reliability does not depend 
on the application. The classical source/channel separation theorems justify a layered communication architecture 
with an interface that focuses primarily on the message rate. Rate has the advantage of being additive in nature 
and so multiple applications can be supported over a single link by simple multiplexing of the message streams. 
This paradigm has been so successful in practice that researchers often assume that it is always valid. 

Interactive applications pose a challenge to this separation-based paradigm because long delays are costly. Part I
of this paper [1] studies the requirements for the scalar version of the interactive application illustrated in Figure 1:
stabilization of an unstable linear system with feedback that must go through a noisy communication channel. It
turns out that message rate is not the only relevant parameter since the underlying noisy channel must also support
enough anytime-reliability to meet the targeted sense of stability. However, the architectural implications of this
result are unclear in the scalar case since there is only one message stream.

To better understand the architectural requirements for interactivity in a well-defined mathematical setting, this
correspondence considers the stabilization of linear systems with a vector-valued state. Prior work on
communication-limited stabilization problems had also considered such vector problems from a source coding
perspective. [2] showed that the minimum rate required is the sum of the logs of the magnitudes of the unstable
eigenvalues and [3] extends the result to certain classes of unbounded driving disturbances. The multiparty case has
begun to be addressed in the control community under the assumption of noiseless channels [4], [5]. The sequential
rate-distortion bounds calculate the best possible control performance using a noisy channel that is perfectly matched
to the unstable open-loop system while being restricted to having a specified Shannon capacity [6], [7]. Thus, the
prior necessary conditions on stabilization are only in terms of Shannon capacity and the prior sufficient conditions
required noiseless channels.

The model of vector-valued linear control systems is introduced in Section II and the main results are stated.
Before going into the proofs, the significance of these results is demonstrated through an extended example in
Section VI involving the stabilization of a vector-valued plant over a binary erasure channel. For this example,
stabilization is impossible unless different bits are treated differently when it comes to transporting them across the
noisy channel. These results establish that in interactive settings, a single "application" can fundamentally require
different senses of reliability for its message streams. No single number can adequately summarize the channel and
any layered architecture for reliable communication should allow applications to individually adjust the reliabilities
on message streams. Recently, Pradhan has investigated block-coding reliability regions for distributed channel
coding without feedback [8], [9]. This correspondence shows that reliability regions are interesting even in the
point-to-point case in that they are both useful and nontrivial.

Fig. 1. Control over a noisy communication channel. The unstable system is persistently disturbed by $W_t$ and must be kept stable in
closed-loop through the actions of $\mathcal{O}, \mathcal{C}$.

Section VI illustrates by example both the implications of the results as well as how to generalize the scalar
results to vector systems with diagonalizable dynamics. The remaining ideas involved in proving the key results
are given in Section VII for sufficiency and in Section VIII for necessity. Many aspects of the results here
are straightforward generalizations from [1] using standard linear control theory tools. To avoid unnecessarily
lengthening this correspondence, the details of these straightforward aspects are omitted. The reader familiar with
[1], [10] should not have any difficulty in filling in the omitted details.

II. The model and main results 

The model is the same as in [1] except that everything is vector-valued. It is depicted in Figure 1. For convenience,
all parts operate in discrete time with a common clock for stepping through time $t$.
The $n$-dimensional state of the control system at time $t$ is denoted $X_t$ and evolves by

$$X_{t+1} = A X_t + B_u U_t + B_w W_t, \quad t \geq 0. \qquad (1)$$

To be interesting, the matrix $A$ should have some unstable eigenvalues that lie strictly outside the unit circle. For the
initial condition, depending on the context we assume either a known zero initial condition $X_0 = 0$ or an unknown
but bounded initial condition $X_0$.

Any convenient finite-dimensional norm can be used since they are all equivalent. Consequently, this
correspondence mostly assumes the $\infty$-norm $\|X\| = \max_i |X(i)|$ for convenience. Throughout, subscripts are
used to denote time indices, and the $i$-th component of a vector is selected using $X(i)$.

The noisy channel is a probabilistic system with an input and an output. At every time step $t$, it takes an input
$a_t \in \mathcal{A}$ and produces an output $z_t \in \mathcal{Z}$ with probability $p(Z_t = z_t \mid a_1^t, z_1^{t-1})$,¹ where the notation $a_1^t$ is
shorthand for the sequence $a_1, a_2, \ldots, a_t$. In general, the current channel output is allowed to depend on all
inputs so far as well as on past outputs.

The $m_y$-dimensional input $Y_t$ to the observer/encoder $\mathcal{O}$ is a linear function of the state corrupted by bounded
additive noise.

$$Y_t = C_y X_t + N_t. \qquad (2)$$

The observer maps $\mathcal{O}_t : (\mathbb{R}^{m_y})^t \to \mathcal{A}$ take the observations $Y_1^t$ and emit a channel input $a_t$.

The channel outputs $Z_1^t$ enter the controller maps $\mathcal{C}_t : \mathcal{Z}^t \to \mathbb{R}^{m_u}$ and result in the $m_u$-dimensional control
signal $U_t$.

The $\{W_t\}$ is a bounded noise/disturbance sequence taking values in $\mathbb{R}^{m_w}$ s.t. $\|W_t\| \le \frac{\Omega}{2}$. Similarly, the
observation noise $\{N_t\}$ is only assumed to be bounded so that $\|N_t\| \le \frac{\Gamma}{2}$.

As in [1], the results in this correspondence impose no restrictions on the individual sequences $\{w_t\}, \{n_t\}$
other than remaining bounded by $\Omega$ and $\Gamma$. In particular, no distribution is assumed for these disturbances. All
distributions with bounded support are already covered by the sufficiency result here while the techniques of [11]
can be applied to generalize the necessity result with suitable technical conditions.

All the randomness comes from the noisy channel and any randomization performed within the observer/encoder
and controller/decoder. To be precise, the underlying sample space is the product of a sample space for the channel
and a sample space for the code. The channel's randomness and the randomness available to the encoder/decoder
are assumed to be independent of each other and this is reflected in the underlying sigma field $\mathcal{F}$ and probability
map $\mathcal{P}$. However, once all the boxes in Figure 1 are connected together and the individual sequences
$\{n_t\}, \{w_t\}$ are specified, the $\{U_t, X_t, Y_t\}$ become a well-defined joint random process on the underlying
probability space.

Definition III: (Parallels Definition 2.2 in [1]) A closed-loop dynamic system with state $X_t$ is $\eta$-stable if there
exists a constant $K$ s.t. for every $\{n_t\}, \{w_t\}$ satisfying their bounds, the expectation $E[\|X_t\|^\eta] < K$ for all $t \ge 0$.

The expectation is taken over all the randomness in the system, including the noisy channel and any randomness
that the controller and observer can access. The constant $K$ may depend on the parameters of the system including
the constants $\Omega, \Gamma$, but the bound on the $\eta$-th moment must hold for all times $t$ and uniformly over all individual
sequences for both the driving disturbance and observation noise.

The equivalence of finite-dimensional norms guarantees that a bounded $\eta$-moment of the $\infty$-norm of $X$ implies
a bounded $\eta$-moment of any other norm and vice versa. This also implies that if the $\eta$-moment is bounded in one
coordinate system it is also bounded in another coordinate system. Thus we will choose the coordinate system best
matched to the system dynamics.

The goal is to design observers $\mathcal{O}_t$ and controllers $\mathcal{C}_t$ that $\eta$-stabilize the system.

¹ The channel law $p(\cdot)$ is a probability mass function in the case of discrete alphabets $\mathcal{Z}$, but is more generally an
appropriate probability measure over the output alphabet $\mathcal{Z}$.

A. Dimensionality mismatches and intrinsic delay 

Unlike the scalar case, the dimensions of $X, U, W, Y$ can all be different. So even without a communication
constraint, stabilizing the system in closed-loop requires $(A, C_y)$ to be an observable pair. A pair of matrices
$(A, C)$ is observable if the matrix $[C, CA, CA^2, \ldots, CA^{n-1}]^T$ is of full rank [10]. This condition assures that
by combining enough raw observations, all the modes of the linear dynamical system can be observed. The
corresponding conditions on $(A, B_u)$ and $(A, B_w)$ are that they be reachable pairs. A pair of matrices $(A, B)$ is
reachable if the matrix $[B, AB, A^2B, \ldots, A^{n-1}B]$ is of full rank [10]. This condition assures that by appropriate
choice of inputs, all the modes of the linear dynamical system can be driven to a desired state.

Definition IV: The intrinsic delay $\Theta(A, B_u, C_y)$ of a linear system is the amount of time it takes the input to
become visible at the output. It is the minimum integer $i \ge 0$ for which $C_y A^i B_u \neq 0$.

For single-input single-output (SISO) systems, this is just the position of the first nonzero entry in the impulse
response.
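The rank conditions and the intrinsic delay above can be checked mechanically. The following is a minimal sketch in plain Python; the helper names (`matmul`, `rank`, `is_observable`, `is_reachable`, `intrinsic_delay`) and the small example matrices are illustrative assumptions, not part of the original development. It relies on the Cayley-Hamilton theorem to stop the search for the intrinsic delay after $n$ steps.

```python
from fractions import Fraction

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def rank(M):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_observable(A, C):
    """Stack C, CA, ..., CA^(n-1) and test for full rank n."""
    n = len(A)
    stacked, M = [], C
    for _ in range(n):
        stacked.extend(M)
        M = matmul(M, A)
    return rank(stacked) == n

def is_reachable(A, B):
    """(A, B) is reachable exactly when (A^T, B^T) is observable."""
    At = [list(col) for col in zip(*A)]
    Bt = [list(col) for col in zip(*B)]
    return is_observable(At, Bt)

def intrinsic_delay(A, B, C):
    """Smallest i >= 0 with C A^i B != 0, or None if the input never shows up."""
    M = C
    for i in range(len(A)):  # by Cayley-Hamilton, i < n suffices
        if any(x != 0 for row in matmul(M, B) for x in row):
            return i
        M = matmul(M, A)
    return None

# Hypothetical 2-state SISO example (not taken from the paper).
A = [[2, 1], [0, 3]]
B = [[0], [1]]
C = [[1, 0]]
print(is_observable(A, C), is_reachable(A, B), intrinsic_delay(A, B, C))
```

In this toy example the control enters the second state only, and the output reads the first state only, so the input needs one extra step to become visible.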

A. Anytime capacity regions 

Anytime capacity is introduced in [1] and related to traditional channel-coding reliability functions in [12]. In 
order to state the results for systems with vector-valued state, it is convenient to introduce the notion of an anytime 
rate region. Throughout this correspondence, rates R are measured in units of bits per time step. 

Definition V: A rate-tuple $(R_1, R_2, \ldots)$ sequential communication system over a noisy channel is a channel
encoder $\mathcal{E}$ and channel decoder $\mathcal{D}$ pair such that:

• Messages enter the encoder at time $t$. Message $M_{i,t}$, a block of bits $S_{i,j}$, corresponds to the $t$-th $R_i$-bit
message sent in the $i$-th message stream and $M_{i,1}^{t-d}$ is shorthand for the sequence $(M_{i,1}, M_{i,2}, \ldots, M_{i,t-d})$. At
the bit level, the $j$-th bit $S_{i,j}$ arrives at encoder $i$ at a time fixed by $j$ and the rate $R_i$.

• The encoders $\mathcal{E}_t : \mathcal{Z}^{t-\theta} \times \{0,1\}^{\sum_i \lfloor R_i t \rfloor} \to \mathcal{A}$ with delay-$\theta$ feedback have access to past channel outputs
$Z_1^{t-\theta}$ in addition to the message bits, and produce a channel input at time $t$ based on everything seen
so far.

• The decoder $\mathcal{D}_t : \mathcal{Z}^t \to \{0,1\}^{\sum_i \lfloor R_i t \rfloor}$ produces updated estimates $\hat{M}_{i,j}(t)$ for all $j \le t$ based on all channel
outputs observed till time $t$.

The anytime rate region $\mathcal{R}_{\text{any}}(\vec{\alpha})$ of a channel is the set of rate-tuples $(R_1, R_2, \ldots)$ that the channel can support
using sequential communication. There has to exist a uniform constant $K$ so that for each $i$, all delays $d$, all times
$t$, and all possible message sequences $\{M_{i,j}\}$,

$$\mathcal{P}\left(\hat{M}_{i,1}^{t-d}(t) \neq M_{i,1}^{t-d}\right) \le K 2^{-\alpha_i d}.$$

No distribution is assumed for the bits $S_{i,j}$ and so this is essentially a requirement on the maximum probability
of error. If common randomness is allowed, this is equivalent to assuming a uniform distribution over the bits and
an average probability of error since the common randomness can be used to make the input look uniform by
XORing each message bit with an iid fair coin toss known to both encoder and decoder.

The $\theta$-feedback anytime rate region refers to the rate region when noiseless channel output feedback is available
to the encoder $\mathcal{E}$ with a delay of $\theta$ time units. If $\theta$ is omitted, it is to be understood as being one.
This generalizes the notion of a single feedback anytime capacity $C_{\text{any}}(\alpha)$ to a rate region $\mathcal{R}_{\text{any}}(\vec{\alpha})$ corresponding
to a vector $\vec{\alpha}$ of anytime-reliabilities specifying how fast the probabilities of error tend to zero with delay for the
different message streams.

Because all the message streams could simply be multiplexed together into a single stream in which everyone
has the same reliability, it is obvious that $\mathcal{R}_{\text{any}}(\vec{\alpha})$ must contain the convex region defined by

$$R_i \ge 0, \qquad \sum_i R_i < C_{\text{any}}\left(\max_j \alpha_j\right). \qquad (3)$$

Similarly, if $\vec{R} \in \mathcal{R}_{\text{any}}(\vec{\alpha})$, then any rate vector obtained by stealing rate from higher reliability message streams
and distributing it among lower reliability streams is also going to be within the rate region.

To get a simple outer bound, just notice that parallel message streams could be multiplexed into a single stream
that achieves an anytime reliability of at least the minimum of the parallel anytime reliabilities. For
convenience, assume that $\vec{\alpha}$ is sorted so that $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$. This means that $\mathcal{R}_{\text{any}}(\vec{\alpha})$ must be contained
within the region defined by the intersection of the following regions:

$$R_i \ge 0, \qquad \sum_{i=1}^{k} R_i \le C_{\text{any}}(\alpha_k) \qquad (4)$$

as $i, k$ range over all the message stream indices.
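The inner bound (3) and outer bound (4) are easy to test for a concrete channel. The sketch below is only illustrative: the function names are hypothetical, and the single-stream curve `c_any_bec` uses the closed form for the binary erasure channel with feedback that the paper quotes from [12] in Section VI-B, with the added convention (an assumption here) that the capacity is reported as zero once the requested reliability reaches $-\log_2 \beta$.

```python
import math

def c_any_bec(alpha, beta):
    """Feedback anytime capacity of a BEC with erasure probability beta,
    for reliability alpha > 0 (closed form quoted from [12])."""
    if beta * 2 ** alpha >= 1:
        return 0.0  # convention used in this sketch: reliability out of reach
    return alpha / (alpha + math.log2((1 - beta) / (1 - beta * 2 ** alpha)))

def in_inner_region(rates, alphas, c_any):
    """Region (3): multiplex everything at the most demanding reliability."""
    return all(r >= 0 for r in rates) and sum(rates) < c_any(max(alphas))

def in_outer_region(rates, alphas, c_any):
    """Region (4): with reliabilities sorted in decreasing order, the k most
    demanding streams together may not exceed C_any(alpha_k)."""
    order = sorted(range(len(rates)), key=lambda i: -alphas[i])
    total = 0.0
    for i in order:
        if rates[i] < 0:
            return False
        total += rates[i]
        if total > c_any(alphas[i]):
            return False
    return True

# Rate and reliability targets of the example in Section VI, BEC with beta = 0.4.
bec = lambda alpha: c_any_bec(alpha, 0.4)
rates, alphas = (0.341, 0.051), (1.02, 0.15)
print(in_inner_region(rates, alphas, bec), in_outer_region(rates, alphas, bec))
```

A rate-tuple that fails (3) but passes (4) sits in the gap between the two bounds, which is exactly where treating streams differently can matter.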

A. Main results 

Since the vector $\vec{\lambda}$ of unstable eigenvalues (if an eigenvalue has multiplicity, then it should appear in $\vec{\lambda}$ multiple
times) plays an important role in these results, some shorthand notation is useful. $\vec{\lambda}_{\|}$ is used to denote the
component-wise magnitudes of the $\vec{\lambda}$. $\log_2 \vec{\lambda}_{\|}$ is used to denote the component-wise logarithms of those magnitudes.

Theorem 5.1: For some $\epsilon > 0$, assume a noisy finite-output-alphabet channel such that the $(\theta + 1)$-feedback
anytime rate region $\mathcal{R}_{\text{any}}(\eta \log_2 \vec{\lambda}_{\|} + \epsilon)$ contains the rate vector $(\log_2 \vec{\lambda}_{\|} + \epsilon)$. Also assume an unknown initial
condition $X_0$ that is bounded.

If the observer has access to the observations $Y_t$ corrupted by bounded additive noise, then any linear system
with dynamics described by (1) with unstable eigenvalues $\vec{\lambda}$, reachable $(A, B_u)$, observable $(A, C_y)$, intrinsic delay
$\Theta(A, B_u, C_y) \le \theta$ can be $\eta$-stabilized by constructing an observer $\mathcal{O}$ and controller $\mathcal{C}$ for the unstable vector system
that together achieve $E[\|X_t\|^\eta] < K$ for all sequences of bounded driving noise $\|W_t\| \le \frac{\Omega}{2}$ and all sequences of
bounded observation noise $\|N_t\| \le \frac{\Gamma}{2}$.

If the observer maps $\mathcal{O}_t$ are also allowed direct access to the past channel outputs with delay $\theta' \ge 1$, then it
suffices to just consider the $\min(\theta', \theta + 1)$-feedback anytime rate region.

By applying (3) to Theorem 5.1 one immediately gets the following easier to check corollary:
Corollary 5.1: If the sum of the logarithms of the magnitudes of the unstable eigenvalues of a system matching
the conditions of Theorem 5.1 is less than the $(\theta + 1)$-delayed feedback anytime capacity $C_{\text{any}}(\eta \max_i \log_2 |\lambda_i|)$,
then it is possible to $\eta$-stabilize the system over the noisy channel in the same sense as in Theorem 5.1.

For the necessity direction, the requirements are relaxed and we only assume that a control system exists that 
works with a known initial condition and without noise in the observations since these make the task of the 
observer and controller easier. 

Theorem 5.2: Assume that for a given noisy channel, system dynamics described by (1) with reachable $(A, B_w)$,
observable $(A, C_y)$, zero initial condition $X_0 = 0$, eigenvalues $\vec{\lambda}$ and $\eta > 0$, that there exists an observer $\mathcal{O}$ and
controller $\mathcal{C}$ for the unstable vector system that achieves $E[\|X_t\|^\eta] < K$ for all sequences of bounded driving noise
$\|W_t\| \le \frac{\Omega}{2}$ and all $t$.

Let $|\lambda_i| > 1$ for $i = 1, \ldots, l$, and let $\vec{\lambda}$ be the $l$-dimensional vector consisting of only the exponentially unstable
eigenvalues of $A$. Then for every $\epsilon_1, \epsilon_2 > 0$ the rate vector $(\log_2 \vec{\lambda}_{\|} - \epsilon_1)$ is contained within the
$(\Theta(A, B_u, C_y) + 1)$-feedback anytime rate region $\mathcal{R}_{\text{any}}(\eta \log_2 \vec{\lambda}_{\|} - \epsilon_2)$ for this same noisy channel.
This theorem reveals that each unstable eigenvalue, no matter whether it has its own eigenvector or not, induces
a demand that the channel be able to reliably transport a message stream. The sufficient conditions of Theorem 5.1
and the necessary conditions of Theorem 5.2 match each other.



VI. Differentiated Service Example 

This section studies a simple numeric example of a vector-valued unstable plant. An explicit self-contained
five-dimensional example was given in [13] for the binary erasure channel, but here a simpler two-dimensional example
is given that leverages the results from [12]. The plant has the dynamics matrix

$$A = \begin{bmatrix} 2^{0.34} & 0 \\ 0 & 2^{0.05} \end{bmatrix} \qquad (5)$$

where the observer has noiseless access to both the state $X$ and the applied control signals $U$. The controller can
apply any 2-dimensional input that it wishes. Assume that the disturbance $W$ is bounded in the $\infty$-norm for all times $t$.
Thus, this example consists of two independent scalar systems that must share a single communication channel.

Section VI-A reviews how it is possible to hold this system's state within a finite box over a noiseless channel using
total rate $R = 0.392$ consisting of one bitstream at rate 0.341 and another bitstream of rate 0.051. Section VI-B
considers a particular binary erasure channel and shows that if it is used without distinguishing between the
bitstreams, then third-moment stability cannot be achieved. Section VI-C shows how a simple priority-based
system can distinguish between the bitstreams and achieve third-moment stability while essentially using the
observer/controller originally designed for the noiseless link. Finally, Section VI-D discusses how this diagonal
example can be transformed into a single-input single-output control problem that suffers from the same limitations.



A. Design for a noiseless channel 

The system defined by (5) consists of two independent scalar systems and so each of these falls under [1]. Since

0.341 > 0.34
0.051 > 0.05,

Theorem 4.1 in [1] guarantees that it is sufficient to use two parallel bitstreams of rates $R_1 = 0.341$ (for the first
subsystem) and $R_2 = 0.051$ (for the second subsystem) to stabilize the system over a noiseless channel. The total
rate is 0.392 bits per channel use.



B. Treating all bits alike 

A strict layering-oriented design attempts to use a virtual bit-pipe interface to connect the observers and controllers
from the previous section. Consider a binary erasure channel with erasure probability $\beta = 0.4$ and noiseless feedback
available to the encoder. There is clearly enough Shannon capacity since $1 - 0.4 = 0.6 > 0.392$. To minimize
latency, the natural choice of coding scheme is a single FIFO queue in which bits are retransmitted until they get
through correctly. The system is illustrated in Figure 2.

If a channel-code does not differentiate among the substreams, then the code would have to give the same anytime
reliability to all the bits. The minimum anytime reliability required is $\alpha^* = 3 \log_2 2^{0.34} = 1.02$. For the binary
erasure channel, there is an exact expression for the feedback anytime-capacity [12, Theorem 3.3]:

$$C_{\text{any}}(\alpha) = \frac{\alpha}{\alpha + \log_2\left(\frac{1 - \beta}{1 - 2^{\alpha}\beta}\right)}. \qquad (6)$$

Plugging in $\beta = 0.4$ and $\alpha = 1.02$ into (6) reveals that the channel can only carry $\approx 0.38 < 0.39$ bits/channel-use
with the required reliability. Thus, it is impossible to simultaneously attain the required rate/reliability pair by using
a channel code that treats all message bits alike.
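The arithmetic behind this conclusion can be reproduced in a few lines. This is a sketch rather than anything from the original text: the variable names are illustrative, and the capacity formula is the closed form attributed above to [12, Theorem 3.3], valid while $2^{\alpha}\beta < 1$.

```python
import math

def c_any_bec(alpha, beta):
    """Feedback anytime capacity of the BEC, assuming 2**alpha * beta < 1."""
    return alpha / (alpha + math.log2((1 - beta) / (1 - beta * 2 ** alpha)))

beta = 0.4                 # erasure probability
eta = 3                    # third-moment stability
log2_eigs = [0.34, 0.05]   # log2 of the two eigenvalue magnitudes

alpha_required = eta * max(log2_eigs)   # one reliability for all bits
rate_required = 0.341 + 0.051           # total rate of the noiseless design
rate_supported = c_any_bec(alpha_required, beta)

print(f"Shannon capacity         : {1 - beta:.3f}")
print(f"required rate            : {rate_required:.3f}")
print(f"required reliability     : {alpha_required:.2f}")
print(f"rate at that reliability : {rate_supported:.3f}")
```

The Shannon capacity comfortably exceeds the required rate, yet the rate available at the required reliability falls short of it, which is the gap the section describes.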
Fig. 2. Forcing all the bitstreams to get the same treatment for reliable communication



C. Differentiated service 

The main difficulty encountered in the previous section is that the most challenging reliability requirement comes
from the larger eigenvalue, while the total rate requirement involves both the eigenvalues. This section explores
the idea of differentiated service at the reliable communication layer as illustrated in the simple priority-based
scheme of Figure 3. This is used to give extra reliability (shorter delays) to the bitstream corresponding to the first
subsystem at the expense of lower reliability (higher delays) for the second one.

• Place bits from the different streams into prioritized FIFO buffers. 

• At every channel use, transmit the oldest bit from the highest priority input buffer that is not empty. 

• If the bit is received correctly, remove it from the appropriate input buffer. 

• If there are no bits waiting in any buffer, then send a dummy bit across the channel. 
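The four rules above translate directly into a small simulation. The sketch below is an illustration under stated assumptions: the function name is hypothetical, bit $j$ of stream $i$ is assumed to arrive at the first time $t$ with $\lfloor R_i t \rfloor \ge j$, and erasures are drawn from a seeded pseudo-random generator. It records the queueing delay of every delivered bit, which is the quantity whose tail the anytime reliability controls.

```python
import math
import random
from collections import deque

def simulate_priority_bec(rates, beta, steps, seed=0):
    """Strict-priority FIFO queues over a BEC with noiseless feedback.
    rates[0] is the highest-priority stream. Returns (delays, dummies):
    per-stream lists of delivery delays and the count of dummy channel uses."""
    rng = random.Random(seed)
    buffers = [deque() for _ in rates]   # arrival times of waiting bits
    arrived = [0] * len(rates)
    delays = [[] for _ in rates]
    dummies = 0
    for t in range(1, steps + 1):
        # Assumed arrival model: floor(rate * t) bits have arrived by time t.
        for i, rate in enumerate(rates):
            while arrived[i] < math.floor(rate * t):
                arrived[i] += 1
                buffers[i].append(t)
        erased = rng.random() < beta
        # Serve the oldest bit of the highest-priority non-empty buffer.
        for i, buf in enumerate(buffers):
            if buf:
                if not erased:           # received: remove it from its buffer
                    delays[i].append(t - buf.popleft())
                break
        else:
            dummies += 1                 # every buffer empty: send a dummy bit
    return delays, dummies

# Rates and erasure probability of the running example.
delays, dummies = simulate_priority_bec((0.341, 0.051), beta=0.4, steps=100_000)
for name, d in zip(("high", "low"), delays):
    print(name, "priority: bits delivered =", len(d), " max delay =", max(d))
```

Comparing the empirical delay tails of the two streams gives a rough feel for the different reliabilities, although a finite simulation cannot by itself establish an exponent.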

Two priority levels are used. The higher one corresponds to the rate $R_1 = 0.341$ bitstream coming from the first
subsystem with eigenvalue $2^{0.34}$. The lower one corresponds to the rate $R_2 = 0.051$ bitstream and corresponds to
the subsystem with eigenvalue $2^{0.05}$.

The decoder functions on a stream-by-stream basis. Since there is noiseless feedback and the encoder's incoming 
bitstreams are deterministic in their timing, the decoder can keep track of the encoder's buffer sizes. As a result, it 
knows which incoming bit belongs to which stream and can pass the received bit on to the appropriate subsystem's 
controller. The sub-system controllers are patched as in the proof of Theorem 4.1 in [1] — they apply XfUi(t ) if 
their bit arrives with a delay of d time-steps instead of showing up at time t a as expected. 

All that remains is to calculate a lower bound on the anytime reliabilities delivered by such a communication 
scheme. 

Theorem 6.1: For the binary erasure channel with erasure probability $\beta > 0$ used with the strict two-priority
encoder above, high-priority rate $0 < R_H < 1 - \beta$ and low-priority rate $0 < R_L < 1 - \beta - R_H$, the system attains
anytime-reliabilities $\alpha_H, \alpha_L$ satisfying

$$\alpha_H = C_{\text{any}}^{-1}(R_H)$$

$$\alpha_L \ge \max_{\rho \le \rho_{HL}} E_0(\rho) - \rho R_H \qquad (7)$$

where

$$E_0(\rho) = -\log_2\left(\beta + 2^{-\rho}(1 - \beta)\right) \qquad (8)$$

is the base-2 Gallager function for the BEC from [14] and $\rho_{HL}$ is the unique solution to

$$R_H + R_L = \frac{E_0(\rho_{HL})}{\rho_{HL}}. \qquad (9)$$

When $R_L$ is low enough, the bound (7) evaluates to the sphere-packing bound $D(1 - R_H \| \beta)$ for the anytime
reliability of the lower-priority bitstream.
Proof: See the appendix.

Fig. 3. The strict priority queuing strategy for discrimination between bitstreams. Lower priority buffers are served only if the higher
priority ones are empty.

Numerical evaluation of the bound (7) for the rate-pair $R_H = R_1 = 0.341$, $R_L = R_2 = 0.051$ gives the anytime
reliabilities $\alpha_1 \approx 1.11 > 1.02$ and $\alpha_2 \ge 0.196 > 0.15$. This reveals that both subsystems will remain stable in
the third-moment sense.
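These numbers can be reproduced with a short bisection-based calculation. The sketch below is one possible way to evaluate the quantities in Theorem 6.1, not the authors' procedure: the function names are hypothetical, the BEC anytime capacity is the closed form from [12] used in Section VI-B, and the maximization in the low-priority bound exploits the concavity of $E_0(\rho) - \rho R_H$ by locating the zero of its derivative.

```python
import math

def c_any_bec(alpha, beta):
    """Feedback anytime capacity of the BEC (zero once 2**alpha * beta >= 1)."""
    if beta * 2 ** alpha >= 1:
        return 0.0
    return alpha / (alpha + math.log2((1 - beta) / (1 - beta * 2 ** alpha)))

def e0(rho, beta):
    """Base-2 Gallager function of the BEC."""
    return -math.log2(beta + (1 - beta) * 2.0 ** (-rho))

def bisect(f, lo, hi, iters=200):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def priority_reliabilities(r_high, r_low, beta):
    """Return (alpha_H, lower bound on alpha_L) for the two-priority scheme."""
    # alpha_H inverts the anytime capacity, which decreases with alpha.
    alpha_h = bisect(lambda a: c_any_bec(a, beta) - r_high,
                     1e-9, -math.log2(beta) - 1e-12)
    # rho_HL solves E0(rho)/rho = R_H + R_L; the ratio decreases with rho.
    rho_hl = bisect(lambda p: e0(p, beta) / p - (r_high + r_low), 1e-9, 1e3)

    def slope(p):  # derivative of E0(rho) - rho * R_H
        x = (1 - beta) * 2.0 ** (-p)
        return x / (beta + x) - r_high

    rho_star = rho_hl if slope(rho_hl) >= 0 else bisect(slope, 1e-9, rho_hl)
    return alpha_h, e0(rho_star, beta) - rho_star * r_high

alpha_h, alpha_l = priority_reliabilities(0.341, 0.051, beta=0.4)
print(f"alpha_H = {alpha_h:.3f}, alpha_L >= {alpha_l:.3f}")
```

For these rates the unconstrained maximizer lies below $\rho_{HL}$, so the low-priority bound coincides with the divergence $D(1 - R_H \| \beta)$ mentioned after the theorem.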

All of this is illustrated graphically in Figure 4. The diagonal lines have a slope of $\eta = 3$. The marked x on
the plot has rate equal to $0.34 + 0.05$ and a reliability of $3 \times 0.34$. Since it is outside the anytime capacity region
demarcated by the BEC's uncertainty-focusing bound, it is not achievable. The two marked o points represent the
reliabilities achieved by the high and low priority streams. Notice that both are above their corresponding diagonal
lines and so the resulting closed-loop system is 3-stable.



D. Interpreting the example 

The diagonal system example given here is subject to two simple interpretations. First, it can be interpreted as two 
physically distinct control systems that must share a common bottleneck communication link. Thus, it represents 
an information-theoretic example of how different interactive applications sharing the same communication link 
can require differentiated service by the reliable communication layer even in the context of an asymptotic binary 
performance objective like $\eta$-stabilization.

Alternatively, this example can be packaged into a single system with a vector-valued state. This vector-state
system can even be at the heart of a SISO control system. Consider the change of coordinates matrix:

$$T = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \qquad (10)$$

This transformation is used to define $\tilde{A} = T^{-1} A T$, which using (10) and (5) results in

$$\tilde{A} = \begin{bmatrix} 2^{0.34} & 2^{0.05} - 2^{0.34} \\ 0 & 2^{0.05} \end{bmatrix}. \qquad (11)$$

The $B_w$ matrix remains the identity while $B_u$ is a two-dimensional column vector, so that the control is scalar, and

$$C_y = [1, 1]. \qquad (13)$$

This is clearly an unstable SISO system with a scalar observation $Y_t$ and scalar control $U_t$. As a scalar system, it
has two real poles at $2^{0.34}$ and $2^{0.05}$. Both are outside the unit circle.
Fig. 4. The anytime capacity curve for the binary erasure channel, along with the sphere-packing bound and an evaluation of the low-priority
bound for $R_H = 0.341$. The parabolic curve under the low-priority bound illustrates what happens as $\rho$ ranges in (7). The maximum of the
curve is attained at the sphere-packing bound.



It is easy to verify that $C_y$ and $C_y \tilde{A}$ are linearly independent and thus the system is observable. If there were
neither controls nor driving disturbances, then an observer $\mathcal{O}$ could take an appropriate linear combination of two
consecutive scalar observations $Y_i, Y_{i+1}$ to recover the state $X_i$ exactly. Explicitly,

$$X_i(1) = \frac{(2^{0.34} - 2^{1.05}) Y_i + Y_{i+1}}{2^{1.34} - 2^{1.05}} \quad \text{and} \quad X_i(2) = \frac{2^{0.34} Y_i - Y_{i+1}}{2^{1.34} - 2^{1.05}}.$$

The driving disturbance shows up as an additional "noise" in $X_{i+1}$. Hence the effective observation noise in $Y_i$ is
$N_i$ and the effective observation noise in $Y_{i+1}$ is $W_i(1) + W_i(2) + N_{i+1}$. So the estimation error satisfies

$$|\hat{X}_i(1) - X_i(1)| \le \frac{(2^{1.05} - 2^{0.34} + 1)\Gamma + 2\Omega}{2(2^{1.34} - 2^{1.05})}$$

and similarly

$$|\hat{X}_i(2) - X_i(2)| \le \frac{(2^{0.34} + 1)\Gamma + 2\Omega}{2(2^{1.34} - 2^{1.05})}.$$

This can be interpreted as a larger $\Gamma$ that bounds the norm of the effective observation noise.
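The two-observation recovery and its error bounds can be sanity-checked numerically. The sketch below makes its assumptions explicit: the transformed dynamics are taken to be upper-triangular with diagonal $2^{0.34}, 2^{0.05}$ and off-diagonal entry $2^{0.05} - 2^{0.34}$, the output row is $[1, 1]$, the disturbance enters through an identity $B_w$, and the noise bounds are $\Gamma/2$ and $\Omega/2$ per component. All function names are hypothetical.

```python
a, b = 2 ** 0.34, 2 ** 0.05
DENOM = 2 * (a - b)   # equals 2**1.34 - 2**1.05

def step(x, w=(0.0, 0.0)):
    """One uncontrolled step of the assumed transformed system."""
    return [a * x[0] + (b - a) * x[1] + w[0], b * x[1] + w[1]]

def observe(x, noise=0.0):
    """Scalar observation with C_y = [1, 1]."""
    return x[0] + x[1] + noise

def recover(y0, y1):
    """Estimate the state at the time of y0 from two consecutive observations."""
    x1 = ((a - 2 * b) * y0 + y1) / DENOM
    x2 = (a * y0 - y1) / DENOM
    return [x1, x2]

def error_bounds(gamma, omega):
    """Worst-case estimation errors for |N| <= gamma/2 and |W(k)| <= omega/2."""
    e1 = ((2 * b - a + 1) * gamma + 2 * omega) / (2 * DENOM)
    e2 = ((a + 1) * gamma + 2 * omega) / (2 * DENOM)
    return e1, e2

# Noise-free check: the state is recovered exactly (up to rounding).
x = [0.7, -1.3]
print(recover(observe(x), observe(step(x))))
print(error_bounds(gamma=0.2, omega=0.1))
```

Because the two eigenvalues are close, the denominator is small and the effective observation noise is amplified noticeably, which is why the text describes it as a larger $\Gamma$.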



To support the estimation performed at the observer, the controller could either refrain from applying any controls
for two consecutive time-instants or equivalently apply control signals that are perfectly known to the observer.

Similarly, the controllability conditions are satisfied since $B_u$ and $\tilde{A} B_u$ are linearly independent. By identical
reasoning, this means that the controller can apply any desired control to each of the underlying states by preparing
scalar controls in batches of two consecutive time units. Thus, the controller can alternate between applying a zero
control for two time units and then applying a batched control for the next two time units.

Stabilizing the output of the SISO system clearly requires stabilizing all of the internal states since the system
is observable. Since the internal state evolution of this system is governed by $\tilde{A}$, it is essentially the same as
that governed by the diagonal $A$ since the two differ only by a linear change of coordinates. This means that
differentiated service across the erasure channel is required for this single SISO system as well!



VII. Sufficiency: Proof of Theorem 5.1

Theorem 5.1 is proven in stages. The scalar case is in [1] and as Section VI-D shows, the scalar results immediately
generalize to systems with purely diagonal dynamics through a change of coordinates. The bounded initial condition
can be interpreted as a zero initial condition and bounded driving noise, but for a system that starts at time $-1$.
Consequently, the first new issue concerns systems with nondiagonal Jordan blocks. After that, we consider general
systems that are reachable and observable, but where the anytime code is assumed to be a black-box with its own
access to channel feedback. Finally, we show how to operate a feedback anytime code without any explicit feedback
path for the channel outputs.






A. Non-diagonal Jordan blocks. 

Proposition 7.1: For some $\epsilon > 0$, assume access to an anytime-code that supports the rate vector $(\log_2 \vec{\lambda}_{\|} + \epsilon)$
with anytime reliabilities $\eta \log_2 \vec{\lambda}_{\|} + \epsilon$.

Consider an $n$-dimensional linear system with dynamics described by (1) having positive real unstable eigenvalues
$\vec{\lambda}$, $A$ in block-diagonal form with each block being upper-triangular and having a single real-valued eigenvalue $\lambda$
on its diagonal, $B_u$ and $C_y$ as identity matrices so that each state can be individually controlled, with observations
$Y_t$ corrupted by bounded additive noise.

Then for all $\Omega > 0$, $\Gamma > 0$, there exists a $K > 0$ so that the system can be $\eta$-stabilized by constructing an
observer $\mathcal{O}$ and controller $\mathcal{C}$ for the unstable vector system that together achieve $E[\|X_t\|^\eta] < K$ for all sequences
of bounded driving noise $\|W_t\| \le \frac{\Omega}{2}$ and all sequences of bounded observation noise $\|N_t\| \le \frac{\Gamma}{2}$.

Furthermore, this continues to hold even if the controller is restricted to applying a nonzero control signal only
every $n$ time steps and the observer is similarly restricted to sample the state every $n$ time steps.
Proof: Because the blocks corresponding to different eigenvalues do not interact with each other in the specified
model, it suffices to consider an $n$-dimensional square $A$ matrix that represents a single upper-triangular block:

$$A = \begin{bmatrix} \lambda & a_{1,2} & \cdots & a_{1,n} \\ 0 & \lambda & \cdots & a_{2,n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda \end{bmatrix} \qquad (14)$$

There are two key observations. The first is that the dynamics for the last component $X_t(n)$ are the same as in
the scalar case: $X_{t+1}(n) = \lambda X_t(n) + W_t(n) + U_t(n)$. This faces a driving disturbance with bound $\Omega_n = \Omega$.
The second is that the dynamics for all the other components are given by:

$$X_{t+1}(i) = \lambda X_t(i) + U_t(i) + W_t(i) + \sum_{j=i+1}^{n} a_{i,j} X_t(j). \qquad (15)$$

Recall that the constructions in Section IV.B of [1] (duplicated here in Appendix III for reader convenience) are based on having a virtual controlled process $\bar{X}$ that is stabilized over a finite-rate noiseless channel in a manner that keeps the virtual state within a $\Delta$-sized box, no matter what the disturbances are. The observer essentially tells the controller what controls to apply so as to do this and protects these instructions with an anytime channel code.

Group together the weighted sum of the bounded virtual controlled state dimensions $\sum_{j=i+1}^{n} a_{i,j} \bar{X}_t(j)$ and the net disturbance $W_t(i)$ into a single disturbance term. The new bound on the disturbance is simply

$$\Omega_i = \Omega + \sum_{j=i+1}^{n} |a_{i,j}| \Delta_j$$

where the $\Delta_j$ are computed recursively using the following formula from [1]:

$$\Delta_i = \frac{\Omega_i}{1 - \lambda 2^{-R}}$$

where $R > \log_2 \lambda$.
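As a hedged illustration of this recursion, the sketch below computes the disturbance bounds $\Omega_i$ and box sizes $\Delta_i$ for a single upper-triangular block, working from the last component upward. The function name `box_sizes`, the list-of-lists layout for the off-diagonal entries, and the reading that component $j$ contributes $|a_{i,j}|\Delta_j$ to the bound for component $i$ are assumptions made for illustration, not notation taken from [1].

```python
def box_sizes(lam, offdiag, omega, rate):
    """Return (Omega_i, Delta_i) lists for one n x n upper-triangular block.

    lam     : the common diagonal eigenvalue (assumed > 1)
    offdiag : offdiag[i][j] holds a_{i,j} for j > i (0-based indices)
    omega   : bound Omega on the driving disturbance
    rate    : bits per time step R, assumed to exceed log2(lam)
    """
    n = len(offdiag)
    shrink = 1.0 - lam * 2.0 ** (-rate)
    if shrink <= 0:
        raise ValueError("need R > log2(lambda)")
    omegas = [0.0] * n
    deltas = [0.0] * n
    # The last component sees only the original disturbance; earlier
    # components also absorb the boxes of the later ones.
    for i in range(n - 1, -1, -1):
        coupling = sum(abs(offdiag[i][j]) * deltas[j] for j in range(i + 1, n))
        omegas[i] = omega + coupling
        deltas[i] = omegas[i] / shrink
    return omegas, deltas
```

Because each $\Delta_i$ depends only on later components, a single backward pass suffices, which mirrors the finite recursion used in the proof.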

Since there are only a finite number of state dimensions, this shows that "in the box" stabilization is possible using noiseless channels at the appropriate rates. Just as in [1], if used with an anytime code, the control signal must take care to counteract the impact of any previously erroneous control signals. Let $X^U_t$ represent the state at time $t$ that would result from only the actual controls applied (no disturbances) until time $t - 1$. Then $A X^U_t$ is the prediction for what that would evolve into if a zero control were to be applied at time $t$. The current message estimates from the anytime code at time $t$ reveal what the desired value for $X^U_{t+1}$ is. As $B_u$ is the identity, the actual applied control signal is just the difference $X^U_{t+1} - A X^U_t$.

The only remaining question concerns the impact on state $j$ of such temporary anytime decoding errors on message stream $i > j$. Recall that the "impulse response" on state $j$ of an impulse on state $i$ at time 0 is given by $p_{i,j}(t)\lambda^t$, where $p_{i,j}(t)$ is a polynomial in $t$ of order $i - j$ that depends on the elements of the $A$ matrix. The polynomial is bounded above by $K(\lambda^{\epsilon})^t$, where $K$ is a constant depending on $A$, $n$, $\lambda$, $\epsilon$, and $\epsilon > 0$ can be chosen as small as desired. $K$ can further be multiplied by the (small) constant $n$ to bound the net impact on state $j$ of a decoding error in any combination of message streams $i > j$.

Since the message streams have anytime reliability $\alpha_i$ individually, they have anytime reliability $\min_j \alpha_j$ when considered together as a single stream. Since the relevant anytime reliability $\min_j \alpha_j > \eta \log_2 \lambda$, we can choose $\epsilon$ so that $\alpha_i > \eta(1+\epsilon)\log_2 \lambda$ as well. Thus, by the same arguments as Section IV.D of [1], all the $\eta$-moments in the block will be bounded.

Notice that $A^n$ is also upper-triangular with diagonal terms of $\lambda^n$. Thus, by arguments identical to those of Theorem 4.4 in [1], the results continue to hold if both the observer and controller are restricted to act only every $n$ time steps. □



B. Changing coordinates: complex unstable eigenvalues 

The restriction to real block-diagonal upper-triangular systems in Proposition 7.1 is easily overcome by choosing the right coordinate frame.

Proposition 7.2: Proposition 7.1 holds even if the real $A$ matrix has complex unstable eigenvalues $\lambda_i$.
Proof: To avoid any complications arising from the complex eigenvalues, the real Jordan normal form can be used [15]. This guarantees that there exists a nonsingular real matrix $V$ so that $VAV^{-1}$ is a diagonal sum of either traditional real-valued Jordan blocks corresponding to the real eigenvalues or special real-valued "rotating" Jordan blocks corresponding to each pair of complex-conjugate eigenvalues. The rotating block for the pair $\lambda = \lambda_r + \lambda_j\sqrt{-1}$ and its conjugate is a real two-by-two matrix:

$$\begin{bmatrix} \lambda_r & \lambda_j \\ -\lambda_j & \lambda_r \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos(\angle\lambda) & \sin(\angle\lambda) \\ -\sin(\angle\lambda) & \cos(\angle\lambda) \end{bmatrix}$$

which is clearly a product of a scaling matrix and a rotation matrix. Group these two-by-two rotating blocks into a block-diagonal unitary matrix $R$. So $VAV^{-1} = \Lambda R$ where $\Lambda$ is now a real block-diagonal matrix whose constituent blocks are upper-triangular and whose diagonals consist of the magnitudes of the eigenvalues.

The key is to take the rotating parts and view them through the rotating coordinate frame that makes the system dynamics real and block-diagonal. Transform to $X'_{kn} = (R^{-kn}V)X_{kn}$ using $R^{-kn}V$ as the real time-varying coordinate transformation. Notice that

$$\begin{aligned} X'_{k+1} &= R^{-(k+1)} V X_{k+1} \\ &= R^{-(k+1)} V A X_k + \cdots \\ &= R^{-(k+1)} V A V^{-1} R^{k} X'_k + \cdots \\ &= R^{-(k+1)} \Lambda R R^{k} X'_k + \cdots \\ &= R^{-(k+1)} R^{k+1} \Lambda X'_k + \cdots \\ &= \Lambda X'_k + \cdots \end{aligned}$$

since the block-diagonal matrix $\Lambda$ (the real scaling factor in $VAV^{-1} = \Lambda R$) commutes with the unitary block-diagonal matrix $R$. The time-varying nature of the transformation is due to taking powers of a unitary matrix $R$ and so the Euclidean norm is not time-varying. The problem in transformed coordinates falls under Proposition 7.1 and so can be $\eta$-stabilized. □
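The rotating-frame argument can be checked numerically on a single two-by-two block. The following is a minimal sketch, assuming $V = I$, no disturbances or controls, and the sign convention in which the block is $|\lambda|$ times a rotation by $\angle\lambda$; the helper names are hypothetical. It confirms that the de-rotated state grows purely by the factor $|\lambda|$ per step.

```python
import math


def rotation(angle):
    """Rotation matrix [[cos, sin], [-sin, cos]] for the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [-s, c]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def transformed_trajectory(lam_r, lam_j, x0, steps):
    """Run x_{k+1} = B x_k for the rotating block B and return |lambda|
    together with the de-rotated states x'_k = R^{-k} x_k."""
    mag = math.hypot(lam_r, lam_j)
    angle = math.atan2(lam_j, lam_r)
    block = [[lam_r, lam_j], [-lam_j, lam_r]]   # equals mag * rotation(angle)
    x = list(x0)
    out = []
    for k in range(steps + 1):
        out.append(matvec(rotation(-k * angle), x))  # R^{-k} is a rotation by -k*angle
        x = matvec(block, x)
    return mag, out
```

For the pair $1 \pm \sqrt{-1}$ starting from $(1, 0)$, the de-rotated trajectory is $(|\lambda|^k, 0)$ with $|\lambda| = \sqrt{2}$, so in the rotating frame the block behaves like a real scalar system with gain $|\lambda|$.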
C. Dimensionality mismatch

The restriction to $B_u$ and $C_y$ consisting of identity matrices so that each state dimension can be individually controlled and observed is also easily overcome:

Proposition 7.3: For some $\epsilon > 0$, assume access to an anytime-code that supports the rate vector $(\log_2(|\lambda_i|) + \epsilon)$ with anytime reliabilities $\eta \log_2 |\lambda_i| + \epsilon$.

Consider an n-dimensional linear system with dynamics described by the vector system model, with a real matrix $A$ with unstable eigenvalues $\lambda_i$, $(A, B_u)$ reachable, $(A, C_y)$ observable, with observations $Y_t$ corrupted by bounded additive noise.

Assume that the observer $\mathcal{O}$ has access to the applied control signals. Then for all $\Omega > 0$, $\Gamma > 0$, there exists a $K > 0$ so that the system can be $\eta$-stabilized by constructing an observer $\mathcal{O}$ and controller $\mathcal{C}$ for the unstable vector system that together achieve $E[\|X_t\|^{\eta}] < K$ for all sequences of bounded driving noise $\|W_t\| \le \frac{\Omega}{2}$ and all sequences of bounded observation noise $\|N_t\| \le \frac{\Gamma}{2}$.

Proof: First consider the system as though it has a $B_u$ and $C_y$ consisting of identity matrices so that each state can be individually controlled. Proposition 7.2 tells us that for every $\Omega', \Gamma'$ there exists an observer $\mathcal{O}'$ and controller $\mathcal{C}'$ that only interact with the system every $n$ time steps and can $\eta$-stabilize the system.

By the observability of $(A, C_y)$, it is known that there exists a linear map $F$ so that $n$ successive measurements $Y_{t+1}, Y_{t+2}, \ldots, Y_{t+n}$ of the system suffice to recover the final state $X_{t+n} = F(Y_{t+1}, Y_{t+2}, \ldots, Y_{t+n})$ if there were no driving disturbance $W$, observation noise $N$ or control signals $U$. Since all the control signals are presumed to be known exactly at the observer, linearity tells us that their impact on the state $X_{t+n}$ can be compensated for exactly. Thus, only the effect of at most $n$ of the bounded $W, N$ remains. So there exists a $\Gamma' > 0$ such that the observer has access to a $\Gamma'$-boundedly noisy observation of the true state $X_t$ every $n$ time steps. This is used to construct an observer $\mathcal{O}$ from $\mathcal{O}'$.

Similarly, by the controllability of $(A, B_u)$, it is known that there exists a sequence of linear maps $G_k$ so that by applying controls $U_t = G_1(U'), U_{t+1} = G_2(U'), \ldots, U_{t+n-1} = G_n(U')$ in $n$ successive time-steps, the system behaves as though a single control $U'$ was applied to the system that had a $B_u = I$ so that all states were immediately reachable. This is used to construct a controller $\mathcal{C}$ from $\mathcal{C}'$.

The desired proposition follows directly. □. 
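To make the observability step concrete, here is a small sketch for a two-state system with a scalar observation, in the idealized setting of the proof (no disturbance, no observation noise, no controls). The function name and the restriction to $n = 2$ are assumptions made for illustration; the map it computes plays the role of $F$.

```python
def recover_final_state(a, c, y1, y2):
    """Recover X_{t+2} from Y_{t+1} = c . X_{t+1} and Y_{t+2} = c . A . X_{t+1}.

    a : 2x2 system matrix as nested lists
    c : length-2 observation row vector
    """
    # Second row of the observability matrix [c; cA].
    ca = [c[0] * a[0][0] + c[1] * a[1][0],
          c[0] * a[0][1] + c[1] * a[1][1]]
    det = c[0] * ca[1] - c[1] * ca[0]
    if det == 0:
        raise ValueError("(A, C) is not observable")
    # Cramer's rule for X_{t+1}, then one step of the dynamics.
    x1 = [(y1 * ca[1] - c[1] * y2) / det,
          (c[0] * y2 - ca[0] * y1) / det]
    return [a[0][0] * x1[0] + a[0][1] * x1[1],
            a[1][0] * x1[0] + a[1][1] * x1[1]]
```

The result is linear in the measurements, so bounded perturbations of the measurements (from $W$ and $N$) translate into a bounded error in the recovered state, which is what yields the $\Gamma'$-boundedly noisy observation.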

D. Communicating through a plant with delay 

We are now in a position to prove Theorem 5.1 by building upon Proposition 7.3.

Proof: Consider the assumed $(\Theta + 1)$-feedback anytime code. It is clear that simply delaying the outputs of the anytime decoder by a constant $\tau$ time-steps does not change either the message rates or the attained anytime reliabilities. The probability of message error merely gets worse by at most a factor of $2^{\alpha_i \tau}$ on stream $i$. Set $\tau = \Theta + 1$.

Applying Proposition 7.3 gives an observer $\mathcal{O}'$ and controller $\mathcal{C}'$ satisfying the following properties:

• The closed-loop system is $\eta$-stable.

• The control signal $U'_t$ only depends on the channel outputs $Z_1^{t-\tau}$.

• The observer $\mathcal{O}'$ requires access to the past channel outputs $Z_1^{t-\Theta-1}$ to operate the anytime code.

• The observer $\mathcal{O}'$ requires access to the past control signals for its own operation.

It suffices to give the observer access to the past channel outputs $Z_1^{t-\Theta-1}$ since that way, it can compute its own copy of the control signals. If the observer has direct access to past channel outputs, then we are done. Otherwise, the channel outputs must be communicated back to the observer through the vector-plant using only $\Theta + 1$ time steps by making the plant "dance" with that delay following Section V.B.2 in [1]. The boundedness of both the disturbance and the observation noise means that there is a zero-error path to communicate through the plant itself.

The key idea is illustrated in Figure 5. Without loss of generality, assume that $C_y A^{\Theta} B_u$ has a nonzero element in its first column. Let $\psi$ be the response of the system at time $\Theta$ (the intrinsic delay through the plant) when fed an input of the $(1, 0, \ldots, 0)^T$ vector at time 1. Let $\bar{\psi}$ be the maximum of $|\psi_1|, \ldots, |\psi_m|$. Let $\Gamma''$ be the maximum magnitude of the effective observation noise at the receiver after accounting for the combined bounded uncertainties in both the true observation noise as well as the driving disturbances.

Associate the finite channel output alphabet $\mathcal{Z}$ with the positive integers $1, 2, \ldots, |\mathcal{Z}|$. Add $\frac{3\Gamma''}{\bar{\psi}} Z_t$ to the first dimension of the control signal $U'_t$ before applying it. In $\Theta + 1$ time steps, the response will show up at the observer as a shift in $Y_{t+\Theta+1}$ that is unmistakably decodable to recover $Z_t$ exactly.
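The decoding of the channel output from the plant's "dance" can be sketched as a rounding rule. This is only an illustration under stated assumptions: the control offset is taken to be scaled so that the strongest output component moves by three times the effective noise bound per unit of the channel output symbol, the observer has already subtracted every term it knows, and the residual noise has magnitude at most that bound. The function names and parameters are hypothetical.

```python
def observed_shift(z, gamma_eff, noise):
    """Shift seen on the strongest output component after known terms are removed.

    z         : channel output symbol, an integer in 1..|Z|
    gamma_eff : bound on the effective observation noise
    noise     : actual residual noise, assumed to satisfy |noise| <= gamma_eff
    """
    return 3.0 * gamma_eff * z + noise


def decode_channel_output(shift, gamma_eff, alphabet_size):
    """Recover z by rounding; adjacent symbols are 3*gamma_eff apart,
    so noise of magnitude at most gamma_eff cannot cause confusion."""
    z = int(round(shift / (3.0 * gamma_eff)))
    if not 1 <= z <= alphabet_size:
        raise ValueError("shift outside the range produced by the alphabet")
    return z
```

Since the spacing between symbols exceeds twice the noise bound, the decision regions never overlap, which is the zero-error property the argument relies on.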
[Figure 5: controller and observer timelines, viewed in blocks of n time steps. Controller timeline: commit to a control sequence $U^n$, to begin applying in the following n timeslots, that stabilizes the system and counteracts the effect of all prior modulations; while applying those controls, the controller also applies a modulation that communicates the current discrete channel output back to the observer in $1 + \Theta$ time units. Observer timeline: knows the control sequence $U^n$ the controller is applying since it knows all the channel outputs it is based upon; uses knowledge of the controls $U^n$ and modulations to interpret observations $Y^n$ to estimate $X$.]

Fig. 5. When viewing time in blocks of n, the controller is required to commit to its primary controls 1 time step before actually putting them into effect. This way, by the time the observer can first see the effect of these controls, it already knows exactly what that effect is going to be since it knows all the channel outputs that the controls were based upon.



As in Section V.B.2 of [1], the controller uses the controllability of $(A, B_u)$ to superimpose another control input (in blocks of $n$ time-steps) whose purpose is to prevent the past communication-oriented controls (the scaled copies of $Z_t$) from continuing to propagate unstably through the system dynamics. Since this is only a function of past channel outputs, its effect can also be removed from the observations at the observer.

Since the additional communication-oriented control signals only have an impact that lasts for at most $2n$ time steps, the $\eta$-stability of the closed-loop system is unchanged and the theorem is proved. □

VIII. Necessity: Proof of Theorem 5.2

The extension of the scalar-case Theorem 3.3 in [1] to the vector case is largely straightforward and for the most part, the same arguments that worked in Section VII apply on the necessity side, with the controllability of $(A, B_w)$ playing the same role here that controllability of $(A, B_u)$ did in Section VII. Observability is not an issue since the goal is to simulate an unstable system driven by bounded disturbances by using the message bits as well as the $\Theta + 1$ delayed channel outputs. The embedding is such that the uncontrolled process (without the $U$ controls) grows exponentially with time and has high-order bits representing message bits from a long time ago. Since the controlled process is the sum of the uncontrolled process and the undisturbed process (without the $W$ disturbances), the size of $\|X_t\|$ captures the extent to which the controller knows the embedded message bits.

Controllability of $(A, B_w)$ can be used to apply any desired input sequence to the individual eigenstates, at the expense of a smaller bound $\Omega$ since the original disturbance constraint might turn into something smaller after passing through the linear mapping induced by the reachability Grammian $[B_w, AB_w, A^2B_w, \ldots, A^{n-1}B_w]$. This leaves only two non-obvious issues:

• Dealing with channel feedback that is delayed by $\Theta + 1$ time-steps.

• Dealing with non-diagonal Jordan blocks.

Otherwise, the problem reduces by a change of coordinates to parallel scalar systems and Theorem 3.3 in [1] 
gives the desired result. To avoid repeating the same arguments as the previous section and [1], we focus here only 
on the new issues. 
A. Using delayed feedback to simulate the plant

The key idea is that we do not need to feed the simulated plant state $X$ to the observer $\mathcal{O}$, just the simulated plant observation. In order to generate the simulated $Y_{t+1}$, the exact $U$ values are only needed through time $t - \Theta(A, B_u, C_y)$ since controls after that point have not become visible yet at the plant output. In running the simulated control system at the anytime encoder, a delay of $1 + \Theta(A, B_u, C_y)$ can therefore be tolerated rather than the unit delay assumed while proving Theorem 3.3 in [1].



B. Non-diagonal Jordan blocks 

It suffices to consider a single real upper-triangular block (14) since the real Jordan form decouples a general vector problem into such components by a rotating change of coordinates.

The $n$ parallel bitstreams are encoded independently at rates $\log_2 \lambda > R = R_1 = R_2 = \cdots = R_n$ into the simulated individual driving disturbances $W_i(t)$ using the simulator given by equation (6) in the proof of Theorem 3.3 in [1]. The new challenge arises at the decoder.

Notice that the last state $X_t(n)$ is just like the scalar case and only depends on its own bitstream. However, all the other states have a mixture of bitstreams inside of them since the later states enter as interfering inputs into the earlier states. As a result, the decoding algorithm given in Section III.B.2 of [1] will not work on those other states without modification.

The decoding strategy in the upper-triangular case changes to be successive-decoding in the style of decoding for the stronger user in a degraded broadcast channel [16]. Explicitly, the decoding procedure is as follows for every given time $t$ at the decoder:

1) Set $i = n$. Set $D_t(j) = -X^U_t(j)$ for all $j$, where $X^U_t(j)$ represents the $j$-th component of the system in transformed coordinates driven only by the control inputs $U'$, not the disturbances $W'$. This is what is available at the decoder.

2) Decode the bits on the $i$th stream using the algorithm of Section III.B.2 of [1] applied to $D_t(i)$.

3) Subtract the impact of these decoded bits from $D_t(k)$ for every $k < i$.

4) Decrement $i$ and go to step 2, stopping once stream 1 has been decoded.
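The control flow of the steps above can be sketched as follows. This is a deliberately simplified stand-in: the per-stream bit-extraction algorithm of [1] is abstracted as a caller-supplied function `decode_stream`, and the interference of stream $i$ on state $k < i$ is modeled as a fixed coefficient `mix[k][i]` times the decoded value. Both of these are assumptions made for illustration only.

```python
def successive_decode(d, mix, decode_stream):
    """Successively decode n streams from the vector d, last stream first.

    d             : list of the n quantities D_t(1..n) (0-based here)
    mix           : mix[k][i] is the assumed weight of stream i inside d[k], for i > k
    decode_stream : function mapping a cleaned-up d[i] to the decoded value
    """
    n = len(d)
    residual = list(d)
    decoded = [None] * n
    for i in range(n - 1, -1, -1):           # steps 1 and 4: start at i = n, count down
        decoded[i] = decode_stream(residual[i])          # step 2
        for k in range(i):                                # step 3
            residual[k] -= mix[k][i] * decoded[i]
    return decoded
```

In the usage below, the first component would be decoded incorrectly if taken in isolation, since the interference from the last stream flips its sign; removing the later streams first restores it. The same structure also shows why an error in a later stream can propagate into earlier ones.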

Notice that if all the bits decoded up to a point are correct, then when decoding the bits on the $i$th stream (using $D_t(i)$ as the input to the bit-extraction algorithm of Section III.B.2 of [1]), the $D_t(i)$ will contain exactly what it would have contained had the $A$ matrix been diagonal. Consequently, the error probability calculations done in [1] would apply. However, this successive decoding strategy has the possibility of propagating errors between streams and so the error propagation must be accounted for.

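The goal of the $\epsilon_2$ is to allow a slightly lower sense of reliability for the early streams within a block. Equation (11) in [1] tells how much of a deviation in $D_t(i)$ can be tolerated without an error in decoding bits from before $d$ time steps ago. Repeated here:

$$\mathrm{gap}_t(i) = \inf_{\bar{S} : \bar{S}_i \neq S_i} |\check{X}_t(\bar{S}) - \check{X}_t(S)| > \begin{cases} \lambda^{t - \frac{i}{R}} \left( \frac{2\gamma\epsilon_1}{1+\epsilon_1} \right) & \text{if } i \le \lfloor Rt \rfloor \\ 0 & \text{otherwise} \end{cases}$$

where $\gamma = \frac{\Omega}{2\lambda^{1+\frac{1}{R}}}$ and $\epsilon_1 = 2^{\frac{\log_2 \lambda}{R}} - 2$ are constants defined in [1] that depend on the message rate $R$ and the size $\Omega$ allowed while simulating the driving disturbances $W$. In this display, as in [1], the index $i$ is the bit position, so a bit with delay $d = t - \frac{i}{R}$ has a gap of $\frac{2\gamma\epsilon_1}{1+\epsilon_1} 2^{d \log_2 \lambda}$.

To get an upper bound on the probability of error, allocate half of that maximum deviation $\mathrm{gap}_t(i)$ into $n - i + 1$ equally-sized pieces, where $i$ now denotes the stream. So each allocated margin is at most of size $\frac{\gamma\epsilon_1}{(n-i+1)(1+\epsilon_1)} 2^{d \log_2 \lambda}$ when considering a bit with delay $d$. The first $n - i$ of them correspond to allowances for error propagation from later streams. The final piece corresponds to what is allowed from the controlled state at this level. For purposes of bounding, an error is declared whenever any one of these pieces exceeds its allocation.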

The following Lemma shows that error propagation can cause a total deviation only a little larger than $\lambda^d$ on an exponential scale.

Lemma 8.1: Consider a real Jordan block corresponding to $\lambda$ and time $t$. Suppose that there are only decoding errors in a stream $i > j$ occurring for bits corresponding to times after $t - d$ and there are no decoding errors on bits whose delays exceed $d$.

Then for every $\epsilon' > 0$, there exists a $K' > 0$ so that the maximum magnitude deviation of $D_j$ due to the decoding errors in stream $i$ is bounded by $K' 2^{d(1+\epsilon')\log_2 \lambda} = K' \lambda^{(1+\epsilon')d}$.
Proof: See Appendix II.

Using Lemma 8.1 and setting $d'$ to the delay corresponding to the first bit-error in the other stream, the allocated margin can be set equal to the propagation allowance:

$$\begin{aligned} \frac{\gamma\epsilon_1}{(n-i+1)(1+\epsilon_1)} 2^{d \log_2 \lambda} &= K' 2^{d'(1+\epsilon')\log_2 \lambda} \\ \frac{\gamma\epsilon_1}{K'(n-i+1)(1+\epsilon_1)} 2^{d \log_2 \lambda} &= 2^{d'(1+\epsilon')\log_2 \lambda} \\ \frac{\log_2\left(\frac{\gamma\epsilon_1}{K'(n-i+1)(1+\epsilon_1)}\right)}{(1+\epsilon')\log_2 \lambda} + \frac{d}{1+\epsilon'} &= d' \\ K'' + \frac{d}{1+\epsilon'} &= d' \end{aligned}$$

The key point to notice is that the tolerated delay $d'$ on the other streams is a constant $K''$ plus a term that is almost equal to $d$.
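The relationship between the two delays can be evaluated directly. The sketch below is a hedged numerical restatement of the balance between the allocated margin and the propagation allowance; the function name and argument names are illustrative assumptions, with `k_prime` standing for the constant from Lemma 8.1 and `eps_prime` for the slack parameter.

```python
import math


def tolerated_delay(d, lam, gamma, eps1, k_prime, eps_prime, n, i):
    """Delay d' on a later stream at which the propagation allowance
    k_prime * lam**((1 + eps_prime) * d') equals the margin allocated
    to a bit of delay d on stream i."""
    margin_const = gamma * eps1 / (k_prime * (n - i + 1) * (1.0 + eps1))
    k_double_prime = math.log2(margin_const) / ((1.0 + eps_prime) * math.log2(lam))
    return k_double_prime + d / (1.0 + eps_prime)
```

As a check with round numbers, taking the eigenvalue 2, unit $\gamma$ and $\epsilon_1$, Lemma constant 4, slack 0.25, $n = 3$, $i = 2$ and $d = 10$ gives $d' = 4.8$, and both sides of the balance equal 256.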

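Consequently, the probability of error on stream $i$ for bits at delay $d$ or more is upper-bounded by

$$\mathcal{P}\left(|X_t(i)| > \frac{\gamma\epsilon_1}{(n-i+1)(1+\epsilon_1)} 2^{d \log_2 \lambda}\right) + \sum_{j=i+1}^{n} \mathcal{P}\left(\text{Stream } j \text{ has an error at position } K'' + \frac{d}{1+\epsilon'} \text{ or earlier}\right)$$

Finite induction completes the proof. The base case, $i = n$, is obvious since it is just the scalar case by itself. Now assume that for every $j > i$,

$$\mathcal{P}(\text{Stream } j \text{ has an error at position } d \text{ or earlier}) \le K_j 2^{-d \frac{\eta \log_2 \lambda}{(1+\epsilon')^{n-j}}}$$

With the induction hypothesis and base case in hand, consider $i$ and use Markov's inequality since the $\eta$-moment is bounded:

$$\begin{aligned} \mathcal{P}(\text{Stream } i \text{ has an error at position } d \text{ or earlier}) &\le \mathcal{P}\left(|X_t(i)| > \frac{\gamma\epsilon_1}{(n-i+1)(1+\epsilon_1)} 2^{d \log_2 \lambda}\right) + \sum_{j=i+1}^{n} K_j 2^{-\left(K'' + \frac{d}{1+\epsilon'}\right) \frac{\eta \log_2 \lambda}{(1+\epsilon')^{n-j}}} \\ &\le \mathcal{P}\left(|X_t(i)| > \frac{\gamma\epsilon_1}{(n-i+1)(1+\epsilon_1)} 2^{d \log_2 \lambda}\right) + \sum_{j=i+1}^{n} K'_j 2^{-d \frac{\eta \log_2 \lambda}{(1+\epsilon')^{n-j+1}}} \\ &\le K_i 2^{-d \frac{\eta \log_2 \lambda}{(1+\epsilon')^{n-i}}} \end{aligned}$$

where we used the induction hypothesis, the proof of Theorem 3.3 in [1] and the fact that a finite sum of exponentials is bounded by a constant times the slowest exponential. Since $\epsilon'$ was arbitrary and $n$ is finite, this proves the theorem since we can get as close as we want to $\alpha = \eta \log_2 \lambda$ in anytime reliability. □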



IX. Conclusions 

Theorems 5.1 and 5.2 reveal that the problem of stabilizing a linear vector plant over a noisy channel is intimately connected to the problem of reliable anytime communication of parallel message streams over a noisy channel with feedback. The anytime-capacity region of a channel with feedback is the key to understanding whether or not it is possible to stabilize an unstable linear system over that noisy channel. The two problems are related through three parameters. The primary role is played by the magnitudes of the unstable eigenvalues since their logs determine the required rates. The target moment $\eta$ multiplies these logs to give the required anytime reliabilities. Finally, the intrinsic delay $\Theta(A, B_u, C_y)$ tells us the noiseless feedback delay to use while evaluating the required anytime reliabilities when explicit channel feedback is not available and all feedback must be implicitly through the system itself.

To stabilize a system, it is sometimes necessary to treat some bits as being more time-sensitive than others. Though the example in Section VI was crafted with the binary erasure channel in mind, we believe that similar examples should exist for most channels. However, there are also special channels for which such examples do not exist. In particular, the average-power constrained AWGN channel with noiseless feedback is special. As shown in [1], the AWGN channel has a feedback anytime capacity equal to its Shannon capacity regardless of $\alpha$. The need for differentiated service can only exist when there is a nontrivial tradeoff between rate and reliability.

Despite this, the ideas of this correspondence are significant even in the case of AWGN channels. They show 
that stabilization (of all moments) is possible over an adequate capacity AWGN channel with noiseless feedback 
even when there is a dimensionality mismatch between the channel and the plant. Prior results involving only linear 
control theoretic techniques could not reach the capacity bound for cases in which the dimension of the unstable 
plant was different than the dimension of the channel [6]. 

It should also be immediately clear that all the arguments given in [1] on continuous-time models also apply in the 
context of vector-valued states. Standard results on sampling linear systems tell us that in the continuous-time case, 
the role of the magnitude of the unstable eigenvalues is played by the positive real part of the unstable eigenvalues. 
Similarly, all the results regarding the almost-sure sense of stabilization when there is no persistent disturbance 
also carry over directly with no differentiated service required among the unstable eigenvalues. In addition, it is 
easy to extend the suboptimal but "nearly memoryless" simple random observer strategy of Theorem 5.2 of [1] 
to the vector context by randomly labeling a lattice-based quantization of $n$ successive observations $Y_t$. This is
suboptimal because it treats all dimensions alike and also does not take advantage of the feedback to improve the 
anytime reliability of the channel. 

It should be noted that because the results given here apply for general state-space models, they also apply to all 
equivalent linear models. In particular, they apply to the case of control systems modeled using ARMA models or 
with rational open-loop transfer functions of any finite order. Assuming that there is no pole/zero cancellation, such 
results can be obtained using standard linear techniques establishing the equivalence of SISO models to the general 
state-space forms considered here. In those cases, the unstable eigenvalues of the state-space model correspond to 
the unstable poles (together with their multiplicities) of the ARMA model. The intrinsic delay corresponds to the 
number of leading zeros in the impulse response, i.e. the multiplicity of the zero at $z = \infty$.

The primary limitation of the results so far is that they only cover the binary question of whether the plant 
is stabilizable in the $\eta$-moment sense or not. They do not address the issue of performance. In [11], we have a
clean approach to performance for the related scalar estimation problem using rate-distortion techniques. The linear 
systems techniques of this correspondence apply directly to the estimation problem there and can generalize those 
results naturally to the vector case. In particular, it is straightforward, but somewhat cumbersome, to apply these 
techniques to completely solve all the nonstationary auto-regressive cases left open in [17]. 

For the estimation problem of [11] where the limit of large estimation delays does not inherently degrade performance, it turns out that $l$ parallel bitstreams corresponding to each unstable eigenvalue are required, each of rate $> \log_2 |\lambda_i|$, together with one residual bitstream that is used to boost performance in the end-to-end distortion sense. The unstable streams all require anytime reliability in the sense of Theorem 5.2 while the residual stream just requires Shannon's traditional reliability. Since there are no control signals in the case of estimation, intrinsic delay plays no role there.

A second limitation of the results so far is that there are no good inner or outer bounds on the anytime rate and reliability regions beyond the ones for the single-rate/reliability region [12]. However, even without such bounds, we have learned something nontrivial about the relative difficulty of different stabilization problems. For example, consider a scalar system with a single unstable eigenvalue of $\lambda = 8$ as compared to a vector system with three unstable eigenvalues, all of which are $\lambda_i = 2$. From a total rate perspective, the two appear identical, requiring at least 3 bits per unit time. However, they can be distinguished based on the anytime-reliability they require. The scalar case requires anytime-reliability $\alpha > 3\eta$ while the vector case can make do with any $\alpha > \eta$. Since the three eigenvalues are identical in the vector case, there is also no need to prioritize any one of them over the others and thus we can interpret the "vector-advantage" as being a factor reduction in the anytime-reliability required. Thus, in the precise sense of Section VII of [1], vector-stabilization problems are easier than the scalar-stabilization problem having the same rate requirement.$^2$ It seems that spreading the potential growth of the process across many independent dimensions reduces the reliability requirements demanded from the noisy channel.
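The comparison between the scalar and vector examples can be reproduced in a few lines. This is a sketch only: the function names are hypothetical, the rate and reliability thresholds are taken as $\log_2|\lambda_i|$ and $\eta\log_2|\lambda_i|$ per stream, and the two second-moment expressions are a reading of the garbled formulas in footnote 2 that happens to reproduce the numerical values quoted there.

```python
import math


def requirements(eigenvalues, eta):
    """Return (total rate threshold, largest per-stream anytime-reliability threshold)."""
    rates = [math.log2(abs(lam)) for lam in eigenvalues]
    return sum(rates), max(eta * r for r in rates)


def scalar_bound(rate):
    """Assumed second-moment bound for the lambda = 8 scalar plant of footnote 2."""
    return 1.0 / (1.0 - 4.0 ** (3.0 - rate))


def vector_bound(rate):
    """Assumed second-moment bound for three lambda = 2 states sharing the rate equally."""
    return 1.0 / (1.0 - 4.0 ** (1.0 - rate / 3.0))
```

With $\eta = 2$, `requirements([8], 2)` and `requirements([2, 2, 2], 2)` share the same total rate threshold of 3 bits but have reliability thresholds of 6 and 2 respectively, while at rate 4 the two performance expressions evaluate to roughly 1.33 and 2.70.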

Appendix I 

Bounding the anytime reliability region of the strict priority queue 

The proof of Theorem 6.1 builds upon the proof of Theorem 3.3 in [12]. There, the anytime capacity of the binary erasure channel with noiseless instantaneous feedback is computed and shown to achieve the uncertainty-focusing bound which is given parametrically by

$$\alpha = E_0(\rho), \qquad R = \frac{E_0(\rho)}{\rho}. \tag{16}$$

Furthermore, it is shown in [12] that this reliability is attained by the strategy of placing the bits as they deterministically arrive into a FIFO queue that is drained by 1 bit every time the BEC is successful.

In such a code, there is a one-to-one mapping between the queue-length distribution and the delay distribution. Ignoring integer effects for the sake of notational convenience, the event that bit $R(t - d)$ experiences a delay of larger than $d$ is equivalent to the event that the queue contains at least $Rd$ bits at time $t$. Since the marginal for delay has an exponential tail governed by the exponent $\alpha$, this means that the steady-state queue-length $Q$ has a tail governed by $\frac{\alpha}{R}$. Mathematically, $\forall \epsilon > 0$, $\exists K > 0$ so

$$\mathcal{P}(Q > q) \le K 2^{-\left(\frac{\alpha(R)}{R} - \epsilon\right) q} \tag{17}$$

where $\alpha(R)$ is the delay-reliability attained at rate $R$ as governed parametrically by (16). Defining $\rho_R$ as the unique $\rho$ that satisfies (16) immediately gives

$$\mathcal{P}(Q > q) \le K 2^{-(\rho_R - \epsilon) q}. \tag{18}$$
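For the binary erasure channel the parametric description can be evaluated numerically. The sketch below assumes the standard Gallager function of a BEC with erasure probability $\beta$ and uniform inputs, $E_0(\rho) = -\log_2(\beta + (1-\beta)2^{-\rho})$, which is not restated in this section, and solves $R = E_0(\rho)/\rho$ for $\rho$ by bisection. The function names and the iteration count are illustrative choices.

```python
import math


def e0_bec(rho, beta):
    """Gallager function E_0(rho) in bits for a BEC with erasure probability beta."""
    return -math.log2(beta + (1.0 - beta) * 2.0 ** (-rho))


def rho_for_rate(rate, beta):
    """Return the rho > 0 with E_0(rho) / rho == rate, for 0 < rate < 1 - beta.

    E_0 is concave with E_0(0) = 0, so E_0(rho) / rho decreases from the
    capacity 1 - beta toward 0 and the solution is unique.
    """
    if not 0.0 < rate < 1.0 - beta:
        raise ValueError("rate must lie strictly between 0 and capacity")
    lo, hi = 1e-9, 1.0
    while e0_bec(hi, beta) / hi > rate:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if e0_bec(mid, beta) / mid > rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Given the rate, the returned value plays the role of the queue-length tail exponent, and `e0_bec` evaluated at that value gives the corresponding delay exponent.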



A. The high priority stream 

Since the highest priority stream preempts the lower priority stream, it effectively does not have to share the 
channel at all. The queue-length is therefore the same as it would have been for a single bitstream at the high priority rate. This
establishes the desired result for the high priority stream. 



B. The low priority streams 

Let $Q_H, Q_L$ be the steady-state queue lengths for the high and low priority queues respectively. Similarly let $D_H, D_L$ be the delays experienced in the high and low priority queues.

$$\begin{aligned} \mathcal{P}(D_L > d) &= \mathcal{P}(Q_L > R_L d) \\ &\le \mathcal{P}(Q_L + Q_H > R_L d) \\ &\le K 2^{-(\rho_{HL} - \epsilon) R_L d} \end{aligned}$$

where the final inequality comes from realizing that the combined queue-length is the same as the queue-length for a single bitstream arriving with the sum-rate, and from plugging the definition of $\rho_{HL}$ from (9) into (18). The delay exponent seems to be asymptotically governed by $\rho_{HL} R_L$.

The next observation is that the true queue-length exponent must be monotonically decreasing in rate $R_L$ since increasing the rate of low-priority message-bit arrivals can only make the low-priority queue get longer. This allows us to optimize the above bound over all $R'_L > R_L$. Choose $R'_L = \frac{E_0(\rho)}{\rho} - R_H$ where $\rho < \rho_{HL}$. This ranges $R'_L$ from $R_L$ up to $1 - \beta - R_H$. The sum rate is $\frac{E_0(\rho)}{\rho}$ and thus $\rho_{HL'} = \rho$. So the lower-bound on the asymptotic delay error-exponent for the low-priority bits becomes

$$\begin{aligned} \max_{R_L < R'_L < 1 - \beta - R_H} \rho_{HL'} R'_L &= \max_{\rho < \rho_{HL}} \rho \left( \frac{E_0(\rho)}{\rho} - R_H \right) \\ &= \max_{\rho < \rho_{HL}} E_0(\rho) - \rho R_H. \end{aligned}$$

It is immediately obvious from [14] that this can be no higher than the sphere-packing bound at $R_H$, with equality possible if the sphere-packing bound at $R_H$ occurs with a $\rho < \rho_{HL}$. □

It turns out that this bound on the low-priority exponent is tight whenever it hits the sphere-packing bound since the sphere-packing bound governs the tail of the inter-renewal times for the high-priority queue.

$^2$ This vector advantage in terms of required anytime reliability is even more surprising in light of the performance bounds in terms of rate only. [6] gives explicit bounds on the squared-error performance using sequential distortion-rate theory. Suppose the $\lambda = 8$ scalar plant was driven by a standard iid Gaussian disturbance while the vector plant was diagonal and driven by three iid Gaussians each of variance $\frac{1}{3}$. For a given rate $R$ (in bits), the sequential distortion-rate bound on $E[|X_t|^2]$ is $\frac{1}{1 - 4^{3-R}}$ for the scalar system while it is $\frac{1}{1 - 4^{1 - \frac{R}{3}}}$ for the vector system. For a given rate, the second-moment performance of the vector system is worse than the scalar one. For example, at rate 4 the scalar one gets to $\approx 1.33$ while the vector one is $\approx 2.70$. At high rates, the two approach each other in terms of second-moment performance but the anytime-reliability requirements for the scalar system remain much higher.
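The optimization behind the low-priority bound can be explored numerically for the binary erasure channel. The following sketch assumes the standard BEC Gallager function $E_0(\rho) = -\log_2(\beta + (1-\beta)2^{-\rho})$ and evaluates $E_0(\rho) - \rho R_H$ on a grid of $\rho$ values up to a supplied limit standing in for the sum-rate parameter; the function names, the grid search and the inclusion of the endpoint are simplifications made for illustration.

```python
import math


def e0_bec(rho, beta):
    """Gallager function E_0(rho) in bits for a BEC with erasure probability beta."""
    return -math.log2(beta + (1.0 - beta) * 2.0 ** (-rho))


def low_priority_exponent_bound(rate_high, rho_limit, beta, grid=10000):
    """Grid-search the maximum of E_0(rho) - rho * rate_high over 0 < rho <= rho_limit."""
    best = 0.0
    for step in range(1, grid + 1):
        rho = rho_limit * step / grid
        best = max(best, e0_bec(rho, beta) - rho * rate_high)
    return best
```

When the unconstrained maximizer lies inside the allowed range, the result coincides with the sphere-packing style maximum at the high-priority rate; otherwise the maximum sits at the edge of the range and the bound is strictly smaller.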

Appendix II 
Proof of Lemma 8.1

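Assume all the rates $R_i = R$ for simplicity. First write the expression corresponding to equation (5) in [1] for the states $i < n$. [1] tells us that $(2 + \epsilon_1) = \lambda^{\frac{1}{R}}$ and so the virtual uncontrolled state is

$$\check{X}_t(i) = \gamma\lambda^t \left( \sum_{k=0}^{\lfloor Rt \rfloor} \lambda^{-\frac{k}{R}} S_i(k) + \sum_{k=0}^{\lfloor Rt \rfloor} \lambda^{-\frac{k}{R}} p_1(\lfloor Rt \rfloor - k) S_{i+1}(k) + \cdots + \sum_{k=0}^{\lfloor Rt \rfloor} \lambda^{-\frac{k}{R}} p_{n-i}(\lfloor Rt \rfloor - k) S_n(k) \right) \tag{19}$$

where the $p_k$ represent polynomials that depend on the $A$ matrix. The key feature of polynomials is that for every $\epsilon > 0$, it is possible to choose a constant $K_j > 0$ so that $p_k(\tau) \le K_j 2^{\epsilon\tau}$. The maximum possible deviation is bounded by considering the case in which an error is made on all the bits after a certain point $t - d$ since the worst case is when every bit that could be wrong is wrong.

In that worst case, with $\hat{S}_i(k)$ denoting the erroneously decoded symbols (each differing from $S_i(k)$ by at most 2 in magnitude), the magnitude of the deviation in $D_j$ due directly to decoding errors is given by:

$$\begin{aligned} \left| \gamma\lambda^t \sum_{k=\lceil R(t-d) \rceil}^{\lfloor Rt \rfloor} \lambda^{-\frac{k}{R}} p_{i-j}(\lfloor Rt \rfloor - k) \left( S_i(k) - \hat{S}_i(k) \right) \right| &\le 2\gamma\lambda^t \sum_{k=\lceil R(t-d) \rceil}^{\lfloor Rt \rfloor} \lambda^{-\frac{k}{R}} K_j 2^{\epsilon(\lfloor Rt \rfloor - k)} \\ &\le 2\gamma K_j 2^{(R\epsilon + \log_2 \lambda)t} \sum_{k=\lceil R(t-d) \rceil}^{\lfloor Rt \rfloor} 2^{-k\left(\epsilon + \frac{\log_2 \lambda}{R}\right)} \\ &< 2\gamma K_j 2^{(R\epsilon + \log_2 \lambda)t} 2^{-R(t-d)\left(\epsilon + \frac{\log_2 \lambda}{R}\right)} \sum_{k=0}^{\infty} 2^{-k\left(\epsilon + \frac{\log_2 \lambda}{R}\right)} \\ &= K' 2^{(Rt\epsilon + t\log_2 \lambda) - (Rt - Rd)\left(\epsilon + \frac{\log_2 \lambda}{R}\right)} \\ &= K' 2^{d\left(1 + \frac{R\epsilon}{\log_2 \lambda}\right)\log_2 \lambda} \end{aligned}$$

Since $\epsilon$ was arbitrary, choose it so $\epsilon' = \frac{R\epsilon}{\log_2 \lambda}$. □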



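[Figure 6: evolution of the window known to contain the virtual state $\bar{X}_t$. The window grows by a factor of $\lambda > 1$ due to the dynamics, the $R$ bits encoding the virtual control $\bar{U}_t$ cut the window by a factor of $2^{-R}$, and the window then grows by $\frac{\Omega}{2}$ on each side, giving a new window for $\bar{X}_{t+1}$.]

Fig. 6. Virtual controller for R=1. How the virtual state $\bar{X}$ evolves.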



Appendix III 
The virtual controlled process 

This section is here for the convenience of the reviewers. It is a copy of the relevant section from Part I 
of this paper. It will be dropped in the final version of this correspondence. 

The observer that has access to the state and knowledge of the controls can reconstruct the driving noise $W_t$ since $W_t = X_{t+1} - \lambda X_t - U_t$. Thus, it has access to the uncontrolled process

$$\check{X}_{t+1} = \lambda \check{X}_t + W_t. \tag{20}$$



The observer acts as though it is working with a virtual controller through a noiseless channel of finite rate R 
in the manner. The resulting bits are sent through the anytime code. The controller attempts to make the true state 
behave like the virtual controlled state by constantly correcting for any erroneous controls that it might have applied 
in the past due to tentative bit errors made by the anytime decoder. 

The observer is constructed to keep the state uncertainty at the virtual controller inside a box of size $\Delta$ by using bits at the rate $R$. It does this by simulating a virtual process $\bar{X}_t$ governed by:

$$\bar{X}_{t+1} = \lambda \bar{X}_t + W_t + \bar{U}_t \tag{21}$$

where the $\bar{U}_t$ represent the computed actions of the virtual controller. This gives rise to a virtual undisturbed process

$$X^U_{t+1} = \lambda X^U_t + \bar{U}_t \tag{22}$$

that satisfies the relationship $\bar{X}_t = \check{X}_t + X^U_t$. The goal is to keep $\bar{X}_t$ within a box $[-\frac{\Delta}{2}, \frac{\Delta}{2}]$ and thereby keep $-X^U_t$ close to $\check{X}_t$.

Because of the rate constraint, the virtual control $\bar{U}_t$ takes on one of $2^{\lfloor R(t+1) \rfloor - \lfloor Rt \rfloor}$ values. For simplicity of exposition, ignore the integer effects and consider it to be one of $2^R$ values and proceed by induction. Assume that $\bar{X}_t$ is known to lie within $[-\frac{\Delta}{2}, \frac{\Delta}{2}]$. Then $\lambda\bar{X}_t$ will lie within $[-\frac{\lambda\Delta}{2}, \frac{\lambda\Delta}{2}]$. By choosing $2^R$ control values uniformly spaced within that interval, it is guaranteed that $\lambda\bar{X}_t + \bar{U}_t$ will lie within $[-\frac{\lambda\Delta}{2^{R+1}}, \frac{\lambda\Delta}{2^{R+1}}]$. Finally, the state will be disturbed by $W_t$ and so $\bar{X}_{t+1}$ will be known to lie within $[-\frac{\lambda\Delta}{2^{R+1}} - \frac{\Omega}{2}, \frac{\lambda\Delta}{2^{R+1}} + \frac{\Omega}{2}]$.

Since the initial condition has no uncertainty, induction will be complete if

$$\frac{\lambda\Delta}{2^R} + \Omega \le \Delta \tag{23}$$

To get the minimum $\Delta$ required as a function of $R$, we can solve for (23) being an equality. This occurs when $\Delta = \frac{\Omega}{1 - \lambda 2^{-R}}$ for every case where $R > \log_2 \lambda$. Since the slope $\frac{\lambda}{2^R}$ on the left hand side of (23) is less than 1, any larger $\Delta$ also works.
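The induction above can be exercised with a short simulation. This is a minimal sketch under stated assumptions: the rate is an integer number of bits per step, the disturbances are supplied by the caller with magnitude at most half the disturbance bound, and the virtual controller cancels the center of whichever of the uniformly spaced cells contains the predicted state. The function name and this particular cell-center choice are illustrative, not taken from [1].

```python
def simulate_virtual_controller(lam, rate_bits, omega, disturbances):
    """Simulate the virtual process and return (box size, list of virtual states).

    lam          : unstable gain, assumed to satisfy 1 < lam < 2**rate_bits
    rate_bits    : integer number of bits R available per time step
    omega        : disturbance bound, each disturbance has magnitude <= omega / 2
    disturbances : iterable of disturbance samples
    """
    levels = 2 ** rate_bits
    delta = omega / (1.0 - lam / levels)      # smallest box size satisfying the induction
    cell = lam * delta / levels               # width of each quantization cell
    x = 0.0                                   # initial condition has no uncertainty
    states = [x]
    for w in disturbances:
        predicted = lam * x                   # lies within +/- lam * delta / 2
        idx = int((predicted + lam * delta / 2.0) // cell)
        idx = min(levels - 1, max(0, idx))    # clip the right edge into the last cell
        center = -lam * delta / 2.0 + (idx + 0.5) * cell
        x = predicted - center + w            # virtual control is -center
        states.append(x)
    return delta, states
```

For a gain of 3 with 2 bits per step and unit disturbance bound, the box size comes out to 4 and every simulated state stays within plus or minus 2, including under a disturbance pattern that alternates between the extremes.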